Do not parse URIs during LSP serialization/deserialization #76691

dibarbet · 2025-01-09T02:45:56Z

Problem

System.Uri will throw attempting to parse URIs that have sub-delims in the host name. This is because it does additional host name validation beyond what is defined in the URI RFC spec. See above linked issues for specific examples.

When we get passed these URIs from VSCode, the server crashes because we cannot successfully deserialize the LSP URI string into a System.Uri. We are unable to easily recover from these as it happens before we get to the queue processing.

Solution

The approach I took in this PR is to stop parsing URIs during LSP message deserialization and remove usages of System.Uri from LSP directly. Instead, I created a wrapper type DocumentUri which initially stores just the string representation of the URI (exactly how LSP defines URIs). The System.Uri can be optionally retrieved from this (which will parse it). Only places which need to extract information from the URI should retrieve the System.Uri and must handle failures.

This allows URI parsing to be delayed until much later (for example looking at the scheme or file path to find a matching document in the workspace). While we still cannot parse the URI, we can handle the error and provide a misc document instead of crashing the server. These aren't normal files anyway (they don't conform to the file:///<path> format) and would go to misc regardless of if we succeed in parsing it or not.

If at some point the runtime adds a new parsing mode, or we switch to our own URI parser (similar to O#), this still provides value. Deserialization of the URI string and parsing the URI string are two separate logical operations and should not be tied together. Additionally, any runtime improvements here would only be available in .NET 10.

Key areas to look at

DocumentUri
DocumentUriConverter
LspWorkspaceManager
ProtocolConversions

Razor PR - https://github.com/dotnet/razor/pull/11390/files

CyrusNajmabadi · 2025-01-14T23:24:47Z

src/LanguageServer/Protocol/Protocol/Converters/DocumentUriConverter.cs

-/// TODO: document.
+/// Converts the LSP spec URI string into our custom wrapper for URI strings.
+/// We do not convert directly to <see cref="System.Uri"/> as it is unable to handle
+/// certain valid RFC spec URIs.  We do not want serialization / deserialization to fail if we cannot parse the URI.


consider linking to the runtime issue.

dibarbet · 2025-01-14T23:25:50Z

src/LanguageServer/ProtocolUnitTests/UriTests.cs

+    // Valid URI, but System.Uri cannot parse it.
+    [InlineData(true, "perforce://@=1454483/some/file/here/source.cs")]
+    [InlineData(false, "perforce://@=1454483/some/file/here/source.cs")]
+    public async Task TestOpenDocumentWithInvalidUri(bool mutatingLspWorkspace, string uriString)


new tests for these URIs

CyrusNajmabadi · 2025-01-14T23:26:14Z

src/LanguageServer/Protocol/Protocol/DocumentUri.cs

+        // If either of the URIs cannot be parsed, we'll compare the original URI strings.
+        if (otherUri.ParsedUri is null || this.ParsedUri is null)
+        {
+            return this.UriString == otherUri.UriString;


this must be false, right? because we did this exact check above, and would have returned 'true' if it succeeded. so why not return false here?

yup, that should definitely return false. good catch

dibarbet · 2025-01-14T23:26:45Z

src/LanguageServer/Protocol/Protocol/DocumentUri.cs

+/// In order to gracefully handle these issues, we defer the parsing of the URI until someone
+/// actually asks for it (and can handle the failure).
+/// </remarks>
+internal sealed class DocumentUri : IEquatable<DocumentUri>


the main new wrapper type

dibarbet · 2025-01-14T23:27:10Z

src/LanguageServer/Protocol/Protocol/Converters/DocumentUriConverter.cs

 /// </summary>
-internal class DocumentUriConverter : JsonConverter<Uri>
+internal class DocumentUriConverter : JsonConverter<DocumentUri>


new serializer for new wrapper type

dibarbet · 2025-01-14T23:28:13Z

src/LanguageServer/Protocol/Workspaces/LspWorkspaceManager.cs

-    private class LspUriComparer : IEqualityComparer<Uri>
-    {
-        public static readonly LspUriComparer Instance = new();
-        public bool Equals(Uri? x, Uri? y)


moved to the new type

dibarbet · 2025-01-14T23:28:32Z

src/LanguageServer/Protocol/Workspaces/LspWorkspaceManager.cs

@@ -112,7 +62,7 @@ public int GetHashCode(Uri obj)
    /// the URI.
    /// <para/> Access to this is guaranteed to be serial by the <see cref="RequestExecutionQueue{RequestContextType}"/>
    /// </summary>
-    private ImmutableDictionary<Uri, (SourceText Text, string LanguageId)> _trackedDocuments = ImmutableDictionary<Uri, (SourceText, string)>.Empty.WithComparers(LspUriComparer.Instance);
+    private ImmutableDictionary<DocumentUri, (SourceText Text, string LanguageId)> _trackedDocuments = ImmutableDictionary<DocumentUri, (SourceText SourceText, string LanguageId)>.Empty;


rest of this file is interesting - contains changes to attempt to parse and handle issues

src/LanguageServer/Protocol/RoslynLanguageServer.cs

davidwengier · 2025-01-15T02:00:16Z

src/Tools/ExternalAccess/Razor/RazorUri.cs


 namespace Microsoft.CodeAnalysis.ExternalAccess.Razor;

 internal static class RazorUri
 {
+    [Obsolete("Use RazorUri.GetUriFromFilePath instead")]


Nit: Shouldn't this tell people to use CreateAbsoluteDocumentUri?

Although, if changing to DocumentUri is a breaking change for Razor, is there a reason to keep this obsolete at all?

So changing to documenturi doesn't seem to break non-cohosted razor. I'm not as sure about this one

davidwengier · 2025-01-15T02:06:26Z

src/LanguageServer/Protocol/Protocol/DocumentUri.cs

+        if (this.ParsedUri is null)
+        {
+            // We can't do anything better than the uri string hash code if we cannot parse the URI.
+            return this.UriString.GetHashCode();


I don't think I understand why GetHashCode shouldn't only look at UriString, and let the Equals method do the rest.

It's perhaps not a real world problem so take this with a grain of salt, but I think GetHashCode looking at the parsed Uri means a dictionary/hashset operation can have different results for the same Uri, depending on whether someone happens to have called ParsedUri yet. If GetHashCode is simpler, then things would collide, and the Equals method would save the day.

Potentially broken example (untested, obv):

var uri = "http://goo"; var documentUri1 = new DocumentUri(uri); var documentUri2 = new DocumentUri(uri); documentUri2.GetRequiredParsedUri(); var dict = new Dictionary<DocumentUri, object>(); dict.Add(documentUri1, new GiantObject()); Assert.True(dict.ContainsKey(documentUri2));

Accessing ParsedUri will cause the URI to attempt to be parsed (which gethashcode does). So 1 and 2 should have the same hash code (but I will verify as well).

If GetHashCode is simpler, then things would collide, and the Equals method would save the day.

In this case, the issue is two different URI strings (with different string hashcodes) should be considered equal. For example an encoded vs. unencoded URI that point to the same file path should be considered equal (as System.Uri does).

There'd be no collision here because the hashcodes would be different, hence the need to look at both.

Ahh I think maybe I read the null check on ParsedUri and forgot that would mean it would cause a parse.

dibarbet · 2025-01-21T23:38:20Z

As for the pause here - I need to update XAML with these changes, but have been working on infra. Will get back to this shortly

dotnet-issue-labeler bot added Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead labels Jan 9, 2025

dotnet-policy-service bot added the VSCode label Jan 9, 2025

dibarbet force-pushed the uri_crash branch 9 times, most recently from a6bf829 to e0e73bb Compare January 14, 2025 00:28

Do not parse URIs during LSP serialization/deserialization

2a3ca24

dibarbet force-pushed the uri_crash branch from e0e73bb to 2a3ca24 Compare January 14, 2025 23:21

CyrusNajmabadi reviewed Jan 14, 2025

View reviewed changes

dibarbet commented Jan 14, 2025

View reviewed changes

CyrusNajmabadi reviewed Jan 14, 2025

View reviewed changes

dibarbet commented Jan 14, 2025

View reviewed changes

CyrusNajmabadi approved these changes Jan 14, 2025

View reviewed changes

dibarbet marked this pull request as ready for review January 14, 2025 23:53

dibarbet requested a review from a team as a code owner January 14, 2025 23:53

Review feedback

5472092

JoeRobich approved these changes Jan 15, 2025

View reviewed changes

src/LanguageServer/Protocol/RoslynLanguageServer.cs Outdated Show resolved Hide resolved

JoeRobich approved these changes Jan 15, 2025

View reviewed changes

additional feedback

6ca67b9

davidwengier reviewed Jan 15, 2025

View reviewed changes

jasonmalinowski self-requested a review January 15, 2025 02:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do not parse URIs during LSP serialization/deserialization #76691

Do not parse URIs during LSP serialization/deserialization #76691

dibarbet commented Jan 9, 2025 •

edited

Loading

CyrusNajmabadi Jan 14, 2025

dibarbet Jan 14, 2025

CyrusNajmabadi Jan 14, 2025

dibarbet Jan 14, 2025

dibarbet Jan 14, 2025

dibarbet Jan 14, 2025

dibarbet Jan 14, 2025

dibarbet Jan 14, 2025

davidwengier Jan 15, 2025

dibarbet Jan 21, 2025

davidwengier Jan 15, 2025

dibarbet Jan 21, 2025

davidwengier Jan 22, 2025

dibarbet commented Jan 21, 2025

Do not parse URIs during LSP serialization/deserialization #76691

Are you sure you want to change the base?

Do not parse URIs during LSP serialization/deserialization #76691

Conversation

dibarbet commented Jan 9, 2025 • edited Loading

Problem

Solution

Key areas to look at

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dibarbet commented Jan 21, 2025

dibarbet commented Jan 9, 2025 •

edited

Loading